The NPD Benchmark: Reality Check for OBDA Systems
Abstract
In the last decades we moved from a world in which an enterprise had one central database (rather small by today's standards) to a world in which many different, and big, databases must interact and operate, providing the user with an integrated and understandable view of the data. Ontology-Based Data Access (OBDA) is becoming a popular approach to cope with this new scenario. OBDA separates the user from the data sources by means of a conceptual view of the data (ontology) that provides clients with a convenient query vocabulary. The ontology is connected to the data sources through a declarative specification given in terms of mappings. Although prototype OBDA systems providing the ability to answer SPARQL queries over the ontology are available, a significant challenge remains when it comes to using these systems in industrial environments: performance. To properly evaluate OBDA systems, benchmarks tailored towards the requirements of this setting are needed. In this work, we propose a novel benchmark for OBDA systems based on real data coming from the oil industry: the Norwegian Petroleum Directorate (NPD) FactPages. Our benchmark comes with novel techniques to generate, from the NPD data, datasets of increasing size, taking into account the requirements dictated by the OBDA setting. We validate our benchmark on significant OBDA systems, showing that it is more adequate than previous benchmarks not tailored for OBDA.
Similar resources
The NPD Benchmark for OBDA Systems
In Ontology-Based Data Access (OBDA), queries are posed over a high-level conceptual view, and then translated into queries over a potentially very large (usually relational) data source. The ontology is connected to the data sources through a declarative specification given in terms of mappings. Although prototype OBDA systems providing the ability to answer SPARQL queries over the ontology ar...
10th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2014)
RDFox is a new materialisation-based RDF system currently being developed at Oxford University. The system is currently RAM-based, and its algorithms have been designed to take full advantage of modern multi-core/processor systems. In my talk I will present an overview of some of the techniques we developed in the context of the RDFox project. In particular, I will discuss our algorithm that pa...
Data Scaling in OBDA Benchmarks: The VIG Approach
In this paper we describe VIG, a data scaler for benchmarks in the context of ontology-based data access (OBDA). Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling up an input data instance to s times its size, while preserving certain application-specific characteristics. The advantage of the approach is that the user is not require...
R2RML Mappings in OBDA Systems: Enabling Comparison among OBDA Tools
In today’s large enterprises there is a significantly increasing trend in the amount of data that has to be stored and processed. To complicate this scenario, the complexity of organizing and managing a large collection of data, structured according to a single, unified schema, means that there is almost never a single place to look to satisfy an information need. The Ontology-Based Data ...
Fast and Simple Data Scaling for OBDA Benchmarks
In this paper we describe VIG, a data scaler for OBDA benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to n times its size, while preserving certain application-specific characteristics. The advantages of the scaling approach are that the same generator is general, in the sense that it can be re-...